hysop.backend.device.opencl.opencl_autotunable_kernel module

class hysop.backend.device.opencl.opencl_autotunable_kernel.OpenClAutotunableKernel(cl_env, typegen, build_opts, autotuner_config, **kwds)

Bases: AutotunableKernel

autotune(name, force_verbose=False, force_debug=False, **extra_kwds)

Autotune this kernel with the given name and extra_kwds.

build_array_args(hardcode_arrays=False, **arrays)

check_cache(required_cache_size)

Check that required_cache_size bytes can fit in workgroup cache.

classmethod check_cl_env(cl_env, typegen)

Check OpenClEnvironment.

classmethod check_field(field, *args, **kwds)

Extend AutotunableKernel.check_field() by checking that the field is defined on the OPENCL backend.

classmethod check_fields(*fields, **kwds)

Extend AutotunableKernel.check_fields() by checking that all fields are defined on the OPENCL backend.

compute_global_work_size(work, local_work_size, extra_parameters, extra_kwds)

Compute the aligned global_work_size from the unaligned global_work_size and local_work_size. The input global_work_size may be None.
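
The alignment step can be illustrated with a minimal sketch. It assumes that "aligned" means each global dimension is rounded up to the next multiple of the matching local dimension, which is what OpenCL 1.x requires of an NDRange. The helper name align_global_work_size is hypothetical and is not part of HySoP; the real method also takes work, extra_parameters and extra_kwds and handles the None case, none of which is modelled here.

```python
def align_global_work_size(global_work_size, local_work_size):
    """Round each global dimension up to a multiple of the local dimension.

    Hypothetical helper for illustration only, not the HySoP implementation.
    """
    if len(global_work_size) != len(local_work_size):
        raise ValueError("global and local work sizes must have the same rank")
    if any(l <= 0 for l in local_work_size):
        raise ValueError("local work sizes must be positive")
    # Ceiling division, then scale back up to a multiple of the local size.
    return tuple(((g + l - 1) // l) * l
                 for g, l in zip(global_work_size, local_work_size))


aligned = align_global_work_size((1000, 17), (64, 4))
print(aligned)  # (1024, 20)
```

A kernel launched on the aligned size has to guard against the extra work-items, typically by comparing the global ID with the unaligned size.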

format_best_candidate(autotuner, file_basename, from_cache, name, extra_kwds, extra_parameters, work_size, work_load, global_work_size, local_work_size, args_mapping, args_list, program, kernel, kernel_name, kernel_src, kernel_statistics, src_hash, extra_kwds_hash, extra_kwds_hash_logs)

Post-processing callback for autotuner results. Transform autotuner results into user-friendly kernel wrappers.

Return an OpenClKernel with default_queue and default_args set to None. Only default_global_size, default_local_size, and args_mapping are set.

Use the build_launcher method to build OpenClKernelLauncher from this OpenClKernel.

format_oclgrind_isolation_argument(arg_name, arg_isol, arg_value)

generate_hash_logs(kernel_name, hash_logs, force=False)

abstract generate_kernel_src(global_work_size, local_work_size, extra_parameters, extra_kwds, tuning_mode, dry_run)

Generate kernel source code as a string.

Return the known OpenCL arguments as a dictionary for codegen capabilities.

generate_oclgrind_isolation_file(kernel, kernel_name, kernel_source, global_work_size, local_work_size, args_list, args_mapping, isolation_params, force=False)

generate_source_file(kernel_name, kernel_src, force=False)

make_array_granularity_index(granularity)

Build array granularity index.

make_array_offset()

make_array_strides(dim, hardcode_arrays)

Build array strides in number of elements instead of bytes.
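
The byte-to-element conversion can be sketched as follows. This is only an illustration of the unit change, not the HySoP implementation: the helper strides_in_elements is hypothetical, and a standard-library memoryview stands in for the array type.

```python
def strides_in_elements(byte_strides, itemsize):
    """Convert strides expressed in bytes to strides expressed in elements.

    Hypothetical helper for illustration only.
    """
    if itemsize <= 0:
        raise ValueError("itemsize must be positive")
    if any(s % itemsize for s in byte_strides):
        raise ValueError("every stride must be a multiple of the item size")
    return tuple(s // itemsize for s in byte_strides)


# A 3x4 row-major buffer of 8-byte doubles.
view = memoryview(bytearray(3 * 4 * 8)).cast("d", shape=[3, 4])
element_strides = strides_in_elements(view.strides, view.itemsize)
print(view.strides, element_strides)  # (32, 8) (4, 1)
```

Element strides can be used directly when indexing a typed pointer inside a kernel, whereas byte strides would first need dividing by the element size.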

make_dt(dtype)

make_parameter(param)

max_device_work_dim()

Return the maximum number of dimensions that specify the global and local work-item IDs.

max_device_work_group_size()

Return the maximum number of work-items per work-group allowed by the device.

max_device_work_item_sizes()

Maximum number of work-items that can be specified in each dimension of the work-group.
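
As a rough illustration of how these three device limits constrain a candidate local_work_size, the sketch below applies the usual OpenCL rules: the rank may not exceed the maximum work dimension, each dimension may not exceed its per-dimension maximum, and the product may not exceed the maximum work-group size. The function is_valid_local_work_size and the limit values are hypothetical; real values would come from the methods above.

```python
from math import prod


def is_valid_local_work_size(local_work_size, max_work_dim,
                             max_work_group_size, max_work_item_sizes):
    """Check a local work size against device limits (illustrative only)."""
    # The rank must be between 1 and the device's maximum work dimension.
    if not 1 <= len(local_work_size) <= max_work_dim:
        return False
    # Each dimension must be positive and within its per-dimension limit.
    if any(l < 1 or l > m
           for l, m in zip(local_work_size, max_work_item_sizes)):
        return False
    # The total number of work-items must fit in one work-group.
    return prod(local_work_size) <= max_work_group_size


# Hypothetical device limits, chosen only for the example.
limits = dict(max_work_dim=3, max_work_group_size=1024,
              max_work_item_sizes=(1024, 1024, 64))

print(is_valid_local_work_size((32, 32), **limits))     # True
print(is_valid_local_work_size((64, 32), **limits))     # False
print(is_valid_local_work_size((8, 8, 128), **limits))  # False
```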

classmethod to_vecn(vec, extend)

Extend an npw.ndarray of size s <= 16 to a compatible OpenCL vector size (1, 2, 3, 4, 8 or 16) by padding it with extra values.
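
The padding idea can be sketched in plain Python. Two assumptions are made here and are not confirmed by the description above: that the smallest compatible vector size is chosen, and that the extra values are a single repeated fill value. The helper pad_to_vector_size is hypothetical and works on lists rather than on npw.ndarray.

```python
# Vector widths listed in the description above.
OPENCL_VECTOR_SIZES = (1, 2, 3, 4, 8, 16)


def pad_to_vector_size(values, fill):
    """Pad values with fill up to the next OpenCL vector size.

    Hypothetical helper for illustration only, not the HySoP implementation.
    """
    n = len(values)
    if not 1 <= n <= 16:
        raise ValueError("expected between 1 and 16 values")
    # Pick the smallest supported vector size that can hold all values.
    target = next(size for size in OPENCL_VECTOR_SIZES if size >= n)
    return list(values) + [fill] * (target - n)


padded = pad_to_vector_size([1, 2, 3, 4, 5], fill=0)
print(padded)  # [1, 2, 3, 4, 5, 0, 0, 0]
```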